Chunking Clinical Text Containing Non-Canonical Language

Authors

  • Aleksandar Savkov
  • John A. Carroll
  • Jackie Cassell
Abstract

Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy.
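
To make the terms "chunk representation" and "context feature window" concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not name the tag schemes or feature sets that were compared, so the IOB2 scheme, the feature names, the `<PAD>` symbol, and the example tokens and tags are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def spans_to_iob2(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Encode (start, end_exclusive, label) chunk spans as IOB2 tags.

    IOB2 is one common chunk representation (assumed here, not taken
    from the paper): B- marks the first token of a chunk, I- marks the
    remaining tokens, O marks tokens outside any chunk.
    """
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def window_features(tokens: List[str], pos_tags: List[str],
                    i: int, width: int = 1) -> Dict[str, str]:
    """Collect word and POS features for positions i-width .. i+width.

    A small `width` corresponds to a narrow context window; positions
    that fall outside the sentence are filled with a padding symbol.
    """
    feats: Dict[str, str] = {}
    for offset in range(-width, width + 1):
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"w[{offset}]"] = tokens[j].lower() if inside else "<PAD>"
        feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


# Invented example in the style of a terse clinical note (illustrative only).
tokens = ["pt", "c/o", "chest", "pain"]
pos_tags = ["NN", "VBZ", "NN", "NN"]
spans = [(0, 1, "NP"), (1, 2, "VP"), (2, 4, "NP")]

iob2_tags = spans_to_iob2(len(tokens), spans)
narrow = window_features(tokens, pos_tags, 2, width=1)
wide = window_features(tokens, pos_tags, 2, width=2)
```

Widening the window from `width=1` to `width=2` grows the feature dictionary from 6 to 10 entries per token, which is the kind of breadth-of-context variation the abstract refers to.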

Related resources

Flexible Text Segmentation with Structured Multilabel Classification

Many language processing tasks can be reduced to breaking the text into segments with prescribed properties. Such tasks include sentence splitting, tokenization, named-entity extraction, and chunking. We present a new model of text segmentation based on ideas from multilabel classification. Using this model, we can naturally represent segmentation problems involving overlapping and non-contiguo...

Shallow Parsing and Text Chunking: a View on Underspecification in Syntax

This paper illustrates a technique of shallow parsing named “text chunking” whereby “parse incompleteness” is reinterpreted as “parse underspecification”. A text is chunked into structured units which can be identified with certainty on the basis of available knowledge. The chunking process stops at that level of granularity beyond which the analysis gets undecidable. We argue that a chunked sy...

An Affinity Based Greedy Approach towards Chunking for Indian Languages

A robust chunker can drastically reduce the complexity of parsing of natural language text. Chunking for Indian languages requires a novel approach because of the relatively unrestricted order of words within a word group. A computational framework for chunking based on valency theory and feature structures has been described here. The paper also draws an analogy of chunk formation in free word ...

Examining reading fluency in a foreign language: Effects of text segmentation on L2 readers

Grouping words into meaningful chunks is a fundamental process for fluent reading. The present study is an attempt to understand the relationship between chunking and second language (L2) reading fluency. The effects of text segmentation on comprehension, rate, and regression in L2 reading were investigated using a self-paced reading task in a moving-window condition. The participants were inte...

Improving Biomedical Text Categorisation with NLP

Background: Text categorisation has been used in bioinformatics to help identify documents containing protein-protein interactions. Standard text categorisation methods have used the bag-of-words approach with little input from NLP. While this has proved effective in the past, there is some evidence that the techniques are not adequate in some biological domains. Here we examine how chunking, n...

Publication date: 2014